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I. INTRODUCTION 



Finite state space semi-Markov models Find application in a variety of areas such 
as queueing theory, reliability, and clinical trials [Refs. 1,2,3]. The application of these 
models often centers on the distribution of a first-passage time to a state or a set of 
states representing for example the lifetime of a system or the end of a busy period of a 
server. Suppose that the observations of the path of the semi-Markov process are all 
that is known about the process. 

In a number of these areas, data arise that are censored. This happens 
frequently, for instance, when fitting lifetime distributions either in medicine or in the 
field of industrial quality control. In medicine, one might be measuring the amount by 
which some new drug extends the life of terminally ill patients. A certain number of 
patients are still alive at the end of the experiment, so we do not know how much their 
lives have been extended overall, and certain others might have died of unrelated 
causes or have been removed from treatment prematurely. In quality control one 
might be measuring the distribution of time-to-failure for a sample of integrated circuit 
chips under conditions that accelerate aging. Again, many of the chips may not have 
failed by the end of the trial, while others may have failed at the very beginning due to 
manufacturing defects unrelated to the mechanisms which cause failures in the long 
run. 

This thesis reports the results of a simulation experiment to compare various 
parametric and nonparametric estimates of the distribution of a first-passage time for a 
particular semi-Markov process with censoring. The specific simulation model and 
estimates considered are given in Chapter 2. Chapter 3 contains the details of the 
simulation experiment and results. Conclusions from the study are given in Chapter 4. 
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II. NATURE OF THE PROBLEM 



A. PROBLEM 



Suppose we observe N individuals. Let X t (i) be the state of the i* individual at 



time t. We will assume {X t (f), t>0) i= 1, 2, N, are independent identically 
distributed semi-Markov processes with three states {0,1,2}. The individuals start at 
t = 0 in state 1. Upon leaving state 1, they transition to state 0 with probability 0 and 
to state 2 with probability 1-0. From state 2, transition is to state 1 with probability 1. 
State 0 is an absorbing state. The sojourn time in state i has a distribution function F; 
(i = 1,2). The individuals are censored . independently. The censoring times are 
exponentially distributed with mean 1/c. The entire path of transitions and sojourn 
times are observed until the time of censoring, if any. Let D be the first entrance time 
to state 0. The problem is to estimate the survival distribution P{D > /} with the 
censored data of the N individuals. 

B. ESTIMATORS 

Four estimators for P{D > /} will be described in this section. The First being 
the Kaplan-Meier estimate [Ref. 4], and the others are Maximum Likelihood, Renewal 
Equation, and Asymptotic Renewal estimates from a paper by P. A. Jacobs [Ref. 5]. 

1. Kaplan-Meier Estimate 

One nonparametric estimate for censored data is the product limit estimate. 
Let Uj, U 2 , L' n be independent identically distributed random variables with 
distribution G. Let Vj, V 2 , ..., V n be independent identically distributed times to 
censorship. Let 



Z; = min (U;,^) 



(eqn 2.1) 



and 




(eqn 2.2) 



Let 51 ^(2) — ••• — Z^ be the order statistics of {Z;} and 5^ be the 

corresponding order statistic of {5;}. The Kaplan-Meier estimate of G(t) is 





n c(i)'- 8 (i) 


ift < z (n) 




• 


VI 

n" 






0 


ift > Z (n) &5 (n) = 0 


(eqn 2.3) 




^Undefined 


ift > Z (n) &6 (n) = 1 





where 

C(i) = (n-i)/(n-i+ 1) (eqn 2.4) 

[Ref. 4:p. 464] and G(t) = l-G(t). If there isn't any censoring, then the product limit 
reduces to the binomial estimate for each t. This estimate applied to the data of the 
passage times to state 0 for the N individuals will be referred to as the Kaplan-Meier 
estimate of the distribution of the first passage time to state 0 and denoted as 
P k (t) = P k {D>t}. 

2. Maximum Likelihood Estimate 

In this subsection, the maximum likelihood estimate will be given for the 
special case when the sojourn time in state / is exponentially distributed with mean 1/p; 
(i=l,2). 

Let R.. be the number of transitions from state i to j for one individual. The 

ij J 

log likelihood function for the individual is 

*£?= R 12 ln(l-0)+R, o ln 0+R 21 ln p 2 + (R 10 + R 12 ) In Pj - p;Tj - p 2 T 2 (eqn 2.5) 

where T; (i = 1,2) is the total time spent in state / before entrance into state 0 or 
censoring [Ref. 5:p. 2]. The maximum likelihood estimators are 

G — R 10 /(R 1q +R 12 ) (eqn 2.6) 
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(eqn 2.7) 



Pi (R 10 +R 12 )/Tj 



p 7 r 21 /t 2 . 



(eqn 2.8) 



The maximum likelihood estimate for the survival distribution is 
[Ref. 5:p. 5 eqn 1.17] 



P {D> t} * {Opj/C^i-^JfK^ + p^/X^expltX^-KXj-l-p^AjexpttXj} (eqn 2.9) 



where Xj and X 2 are the roots of the equation 



AA /\ 



A A 



epiP 2 + y(p'i + p 2 ) + y 2 = o . 



(eqn 2.10) 



The above estimate will be referred to as the parametric estimate and denoted as 
P p (0 s P p (D>t}. 

3. Renewal Equation Estimate 

The probability P{D > /} satisfies the renewal equation 

P(D>t} = F 1 (t) + (l-0)J t o F 1 (ds)F 2 (t-s) + (l-0)J t o (F 1 *F 2 )(ds)P{D>t-s} (eqn 2.11) 

where R is the distribution of the sojourn time in state i, F. (t) = 1-Fj (t), and Fj*F 2 is 
the convolution of Fj and F 2 . 

The solution to the renewal equation 2.11 is 

P{D > t) = g(t) + f Q R(ds) g(t-s) (eqn 2.12) 



where 



g(t) = Fj(t) + (1-0) f Q F j(ds) F 2 (t-s) 



(eqn 2.13) 
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and 



R(t) = I (1-B) n (Fj*F 2 ) n *(t) (cqn 2.14) 

/n - 1 

where (Fj*F 2 ) n (t) denotes the n-fold convolution of (F 1 *F 2 ) with itself at time t. 

A nonparametric estimate for P{D > t} can be obtained by replacing F ; by its 
Kaplan-Meicr estimate and 0 by its maximum likelihood estimate in equation 2.12. If 
the largest sojourn time in state / is censored then the Kaplan-Meier estimate of F. is 

A 1 

not an honest distribution function (F ; (co)< 1) since the estimate is undefined past the 
largest sojourn time. In this case the dishonest distribution estimate is used in all the 
remaining computations which will give a conservative estimate of the survival 
distribution. 

An approximation to equation 2.12 can be found by using a discrete time 
approximation to R(t) as follows. Let 5 > 0 be a constant and let 

p n (S) = (1-6) {[F^F 2 ](n8) - [F^F^n-l^)} . (eqn 2.15) 

Recursively approximate R(t) as follows 

R a (0) = 0 (eqn 2.16) 

R a (S) = Pl (5) 

R a (n6) = £ p k (8) + I Pk (8) R a ((n-k)5) . 

fe-/ 



An approximation to the solution of equation 2.12 using estimates of F ; and 6 is 

a (£) 

P r (D>t} = g(t) + S (R a (k8) - R a ((k-1)5)} g(t-k5) (eqn 2.17) 

few 



where n(8) is the largest integer less than t/5 [Ref. 5:p. 9 eqn 2.9]. If the number of 
individuals N or the time t are large, the estimate of equation 2.17 may require a large 
number of additions of small non-negative numbers. This estimate will be referred to 

A A 

as the renewal estimate and denoted as P f (t) = P f {D>t}. 
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4. Asymptotic Renewal Estimate 

A A 

Let Fj be the Kaplan-Meier estimate of F ; and 0 be the maximum likelihood 
estimate of 0; then define 



<Pj(^) = exp[s^] Fj(ds) 



(eqn 2.18) 



where again F ; may be a dishonest distribution due to censoring of the last sojourn 
time in state i. The asymptotic renewal estimate of the survival distribution is 
[Ref. 5:p. 1 1 eq. 3.11] 



p ( D> t} = exp[tK] (b/fi) 



(eqn 2.19) 



where K is the solution to the equation 



(1-0)$j(k)$ 2 (k) = 1 



(eqn 2.20) 



and 



F = (1-6) J^°exp[sK] s (Fj*F 2 )(ds) 



(eqn 2.21) 



and 



A £ A yy K 

b = (6/k) cpj(K) 



(eqn 2.22) 



A 

The K for equation 2.19 was found by numerical search using equations 2.18 and 2.20. 
The above estimate will be referred to as the asymptotic estimate and denoted as 
P a (t) ^ P a {D>t). 

A 

If P f {D>t} were exactly the solution of the equation 2.12 with the 
Kaplan-Meier estimate of F ; and the maximum likelihood estimate of 0 being used 
then 



P r {D>t}/P a {D>t} ~ 1 



(eqn 2.23) 



as r->co in the case where the Kaplan-Meier estimates are honest distributions. 
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III. ANALYSIS OF THE PROBLEM 



A. SIMULATION 

A Fortran program is written to generate and analyze the data for this problem. 
All simulations are carried out on an IBM 3033AP computer at the Naval 
Postgraduate School using the LLRANDOM II random number generating package 
[Ref. 6]. The data for the simulation experiments are generated as follows: 
Independent exponential censor times with mean 1/c are generated for each individual. 
The individual starts in state 1 at t=0 and an exponential time with mean l/p 1 is 
generated for the sojourn time. A comparison between the sojourn and censor time is 
done; if the sojourn time is smaller, then the sojourn time is recorded; if the censor time 
is smaller, the truncated sojourn time and the censored death time are recorded. From 
state 1, if not censored yet, a uniform random number is compared to theta; if less 
than theta, the process jumps to state 0 and the uncensored death time is recorded; if 
greater than theta, the process jumps to state 2 and an exponential sojourn time with 
mean l/p 2 is computed. The total time (sojourn times in state 1 plus sojourn times in 
state 2) is compared to the censored time; with the same actions as listed above. From 
state 2, the process jumps to state 1 and continues until an uncensored or censored 
death occurs. The times are recorded and the next individual is started. This continues 
until all N individuals have been generated. The data in each state is sorted in 
increasing order for ease of program manipulations. If N is small, it is possible for all 
the sojourn times in a state to be censored or for all the first passage times to state 0 to 

A A A 

be censored which results in P r (t), P a (t), or P k (t) being undefined for all t. In these 
cases the replication is dropped and a new replication generated. 

A sample data set is listed below for N = 10. The first row under state 1 and state 
2 gives each particular censored or uncensored sojourn time that is generated for that 
state. Under each sojourn time, the binary number indicates whether the individual is 
censored (1) or not (0) during that sojourn time. State 0 indicates times of death 
(passage time to state 0), and whether censored (1) or not (0); note that the times 
indicate either the time of death (not censored) or the time of censoring (censored 
death time). The sojourn and death times listed below have been sorted, along with its 
associated censor indicator. 
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State 2 



0.1629 0.2041 2.2201 
0 0 0 

State 1 

0.1356 0.1615 0.2114 0.2748 0.2996 0.3067 0.3450 0.3725 0.3996 0.4305 
1 1 1 0 0 0 1 0 0 1 

0.8676 1.1980 2.4630 
0 0 0 

State 0 N=10 

0.1356 0.1615 0.2748 0.3450 0.8676 0.8832 0.9930 1.1980 2.4630 2.7312 
1 1 0 1 0 0 1 0 0 1 



Using equations 2.3, 2.9, 2.17, and 2.19, estimates of the survival distribution 
P{D> t} from the data are calculated from subroutines in the Fortran program. Output 
from the program produces a tabic like the one below that includes: time, actual 

A 

survival probability (ACT(t)), parametric estimate (P {D>t}), renewal estimate 

\ A ^ 

(P {D>t}), asymptotic estimate (P {D>t}), and the Kaplan-Meier estimate of the first 

r A 3 

passage time to state 0 (P k {D>t}). The actual survival probability ACT(t) is 
computed using equations 2.9 and 2.10 with the actual parameter values instead of the 
estimated values. The Kaplan-Meier estimate uses only the uncensored first passage 
times to state 0. Output in Table I is for the data set listed above. 

In Table I, the renewal and asymptotic estimates decrease as t increases. In this 
case, the largest sojourn times in both state 1 and state 2 are uncensored. To 
demonstrate what can happen when the largest sojourn times are censored, Table II 
shows a case where the largest sojourn times in state 1 and state 2 are censored. 
Notice that after t = 5 there is little change in the renewal estimate. The survival 
probability levels off and becomes constant. The asymptotic estimate starts low (half 
the probability) and goes to zero just after t=5. In a third case, when either of the 
largest sojourn times in state 1 or state 2 are censored, the effects are somewhere 
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between the two cases mentioned above; the renewal estimate starts to level ofF but 
may not become constant and the asymptotic estimate starts lower than normal and 
may go to zero. The dishonest Kaplan-Meier estimate of F. has a definite affect on 

A A 1 

P f (t) and P a (t) for large t. 

TABLE I 

OUTPUT FROM PROGRAM 



Survival Probability P{D> t} 



Time 


ACT(t) 


P p {D>t} 


P r {D>t} 


P a (D>t} 


P k {D>t} 


.5 


0.79965 


0.73641 


0.66667 


0.68473 


0.87500 


1.0 


0.66340 


0.56522 


0.52606 


0.54348 


0.58333 


2.0 


0.47996 


0.35318 


0.36158 


0.34238 


0.38S89 


5.0 


0.19737 


0.09549 


0.09181 


0.08560 


Undefined 


7.0 


0.10985 


0.04027 


0.03451 


0.03397 


Undefined 


10.0 


0.04563 


0.01103 


0.00874 


0.00849 


Undefined 


12.5 


0.02194 


0.00375 


0.00272 


0.00268 


Undefined 


15.0 


0.01055 


0.00128 


0.00086 


0.00084 


Undefined 



Time 


TABLE II 

OUTPUT FROM PROGRAM 
Survival Probability P{D>t} (largest sojourn censored) 
ACT(t) P p {D> t} ?{D>t} P a {D>t} 


P k {D>t} 


.5 


0.79965 


0.79130 


0.78783 


0.35320 


0.90000 


1.0 


0.66340 


0.65115 


0.65547 


0.16873 


0.78750 


2.0 


0.47996 


0.46358 


0.52706 


0.03851 


Undefined 


5.0 


0.19737 


0.18049 


0.49071 


0.00046 


Undefined 


7.0 


0.10985 


0.09675 


0.49020 


0.00002 


Undefined 


10.0 


0.04563 


0.03797 


0.49017 


0.00000 


Undefined 


12.5 


0.02194 


0.01742 


0.49017 


0.00000 


Undefined 


15.0 


0.01055 


0.00799 


0.49017 


0.00000 


Undefined 
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B. ANALYSIS 



For the simulated model described above, parameter values of Pj = l, p 2 = 1 , 
0 = 0.5, and c = 0.5 are used. The simulation uses two different numbers of observed 
individuals. The number of individuals is set at 10 and 50, representing a low and 
moderate number of subjects. The simulation is replicated 500 times utilizing different 
seeds to generate the data. The average relative bias for each estimate is computed by 

(A 

ARB(t) = (1/M) £ (EST.(t)-ACT(t))/ACT(t) (eqn 3.1) 

j-'i 

where EST(t) is the value of an estimate computed for the i 1 * 1 replication at time / and 
ACT(t) is the actual model value at time t. For the Kaplan-Meier estimate, M is taken 
as the number of Kaplan-Meier estimates of the first passage time to state 0 still 
defined by time t. For the other estimates, M is the number of replications (500). 

The figures below show histograms of the relative bias of the observations 
(ESTj(t)-ACT(t))/ACT(t). Figure 3.1a shows histograms of the relative bias for each of 
the four estimates when N= 10 and at t=0.5. Each of the histograms looks relatively 
normal with possibly a slight skew to the left. The parametric estimate has the tightest 
distribution and the asymptotic estimate the worst which is expected since the 
asymptotic properties are for large /. Figure 3.1b shows the relative bias for each 
estimate when N= 10 and (=5.0. The parametric is somewhat normal but skewed to 
the right. The renewal estimate looks a little less skewed. The asymptotic estimate is 
skewed to the right and looks exponential. At time (=5.0, less than half of the 
Kaplan-Meier estimate of the first passage time to state 0 are defined. The histogram 
of the defined Kaplan-Meier estimate is starting to show an accumulation of mass at 
-1.0 which is the value of the relative bias where the largest passage time observation is 
uncensored and less than 5.0. 
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Figure 3.1a Histograms of relative bias for N= 10 and t = 0.5. 
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Figure 3.1b Histograms of relative bias for N= 10 and t=5.0. 
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RELATIVE BIAS AT T-5.0 RELATIVE BIAS AT T-5.0 



Table III shows the ARB(t) of the estimates for the case when N= 10 individuals 
and Table IV for the case when N=50 individuals. The ARB(t) for each estimate is 
given for selected values of t. Along with the ARB(t) in the parentheses is the 
corresponding standard error. The standard error is computed by taking each 
observation of the relative bias (EST;(t)-ACT(t))/ACT(t) and subtracting the ARB(t), 
squaring this and summing over all M observations, then dividing by M-l. This 
produces the distribution variance, which is divided by M and the square root taken of 
to get the standard error of the ARB(t) for each estimate at time t. The variance 
together with the average relative bias can be used to obtain an estimate of the relative 
mean squared error of the estimate. The right most column of the Tables III and IV 
gives the number of replications out of 500 that still has defined Kaplan-Meier 
estimates of the distribution of the first passage time to state 0 by time t. 



TABLE III 

AVERAGE RELATIVE BIAS 
Exponential Model N= 10 (500 Reps) 



Time 


P> 


P r (t) 




P k (0 


#KM 


.5 


-.00183 

(.00533) 


.01292 

(.00545) 


-.18069 

(.01287) 


.01448 

(.00743) 


500 


1.0 


.00575 

(.00942) 


.02788 

(.00917) 


-.24628 

(.01501) 


.04135 

(.01153) 


499 


2.0 


.03470 

(.01656) 


.05672 

(.01568) 


-.34206 

(.01942) 


.07631 

(.01847) 


462 


5.0 


.24324 

(.04126) 


.52162 

(.04267) 


-.43114 

(.03196) 


-.37581 

(.07156) 


225 


7.0 


.50476 

(.06596) 


1.30581 

(.07867) 


-.39861 

(.04399) 


-.69076 

(.08469) 


185 


10.0 


1.15728 

(.12758) 


3.84602 

(.19504) 


-.25101 

(.07358) 


- 1.00000 

(.00000) 


174 


12.5 


2.06816 

(.21894) 


8.49575 

(.41214) 


-.02144 

(.11647) 


- 1.00000 

(.00000) 


174 


15.0 


3.52449 

(.37592) 


18.05457 

(.86673) 


.34887 

(.18841) 


- 1.00000 

(.00000) 


174 
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/\ 

The parametric estimate P p (t) uses the most correct information about the 
process. For N= 10, the parametric estimate is within three standard deviations of 
zero bias for t< 5. As t gets larger, the relative bias tends to increase. The parametric 
estimate understandably has the smallest relative bias for small t. For large t, the small 
sample sizes involved are probably responsible for the larger relative bias. For small 
times the renewal estimate and the Kaplan-Meicr estimate for the distribution of the 
first passage time to state 0 have about the same average relative bias. For small N 
and large t, the renewal estimate has large bias. As noted before, the renewal estimate 
will be biased if the largest observations of the sojourn times in a state are censored 

.A 

thus causing the Kaplan-Meier estimate F ; to be undefined. The bias could also be 
caused by the step size in the discrete time approximation (step size 0.01) being too 
large, or by numerical error in summing large quantities of small numbers, as 
mentioned earlier. The Kaplan-Meier estimate does well for small t and small N. As 
time increases, the number of data points depreciates rapidly. Because of the small 
number of subjects in each run, the Kaplan-Meier estimate of the distribution of the 
first passage time to state 0 lost over half its data due to undefined distributions. By 
time i=10, there are no survivors using the Kaplan-Meier estimate, resulting in the 
-1.0 average relative bias. From equation 2.23, the renewal estimate and the 
asymptotic estimate should be approximately the same for large t if the Kaplan-Meier 

A 

estimates F. ( are always defined. The asymptotic estimate is negatively biased for small 
t but changed over at /> 12.5. Once again, it could be biased due to censoring of the 
largest sojourn times. The asymptotic estimate has the smallest average relative bias 
for large time t. 

Figure 3.2a shows histograms of the relative bias for each of the four estimates 
when N=50 and at t=0.5. Each of the histograms again looks relatively normal. The 
distributions are much tighter when compared to Figure 3.1a. The parametric estimate 
has the tightest distribution and again the asymptotic estimate the worst. Figure 3.2b 
shows the relative bias for each estimate when N=50 and t=5.0. All the estimates 
except the Kaplan-Meier estimate look relatively normal with possibly a slight right 
skew. The Kaplan-Meier estimate of the first passage time to state 0 has just over two 
thirds of its distributions defined and is showing the start of an accumulation at -1.0 
due to the largest passage time to state 0 being less than 5.0. 
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Figure 3.2a Histograms of relative bias for N = 50 and t = 0.5. 
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RELATIVE BIAS FOR N=50 
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Figure 3.2b Histograms of relative bias for N= 50 and t= 5.0. 
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RELATIVE BIAS AT T-5.0 RELATIVE BIAS AT T=5.0 



TABLE IV 

AVERAGE RELATIVE BIAS 



Exponential Model N=50 (500 Reps) 



Time 


P p (t) 


P r (t) 


P.W 


P k (t) 


#KM 


.5 


-.00430 

(.00398) 


-.02928 

(.00455) 


-.12087 

(.00743) 


-.00082 

(.00512) 


500 


1.0 


-.00567 

(.00220) 


-.11500 

(.00278) 


-.12875 

(.00580) 


.00403 

(.00330) 


500 


2.0 


-.00364 

(.00705) 


-.28952 

(.00742) 


-.17511 

(.01087) 


-.00003 

(.00872) 


500 


5.0 


.03428 

(.01623) 


-.03348 

(.01946) 


-.29824 

(.01876) 


.01067 

(.04022) 


341 


7.0 


.08483 

(.02326) 


.56593 

(.03139) 


-.33861 

(.02347) 


-.35364 

(.07749) 


224 


10.0 


.19995 

(.03633) 


2.01703 

(.07707) 


-.35577 

(.03175) 


-.75760 

(.08671) 


188 


12.5 


.33775 

(.05091) 


4.36834 

(.16959) 


-.33S96 

(.04109) 


-.72166 

(.14855) 


186 


15.0 


.52199 

(.07067) 


9.22171 

(.36791) 


-.29647 

(.05409) 


-1.00000 

(.00000) 


184 



For N = 50, the parametric estimate again does well for small /. The average 
relative bias is within three standard deviations of zero for t<7. An improvement for 
large t for the parametric estimate is expected because of the increased number of 
individuals. The renewal estimate again has large bias for large t, though not as much. 
The asymptotic estimate is negatively biased throughout time. For t> 2, the bias looks 
constant. For N = 50, the Kaplan- Meier estimate of the first passage time to state 0 
has negligible average relative bias for t< 7. However as t increases, the Kaplan-Meier 
estimate looses an appreciable amount of its data due to undefined distributions. By 
t= 15, the Kaplan-Meier estimate has no survivors. Once again for large t (/= 15), the 
asymptotic estimate has the smallest average relative bias. 
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A simulation experiment is done for a case in which there is a relatively high 
number of individuals N=100; c = 0.5, p j = 1 , p 2 =l, 0 = 0.5. The results appear in 
* Table V. The increased number of individuals has decreased the average relative bias 
for all the estimates. The standard error of all the estimates has also decreased. From 
Tables III, IV, and V, it appears that as the number of observed individuals increase 
the average relative bias for all the estimates decrease. 



TABLE V 

AVERAGE RELATIVE BIAS 



Exponential Model N= 100 (500 Reps) 



Time 


Vo 


P r (t) 


P a W 


p k «) 


#KM 


.5 


-.00192 

(.00150) 


-.09574 

(.00251) 


-.09600 

(.00399) 


-.00379 

(.00240) 


500 


1.0 


-.00271 

(.00275) 


-.13534 

(.00318) 


-.08705 

(.00512) 


-.00235 

(.00367) 


500 


2.0 


-.00256 

(.00495) 


-.38676 

(.00426) 


-.10980 

(.00756) 


-.00267 

(.00609) 


500 


5.0 


.01258 

(.01136) 


-.69527 

(.00808) 


-.21245 

(.01378) 


.04573 

(.02474) 


448 


7.0 


.03523 

(.01588) 


-.51234 

(.01557) 


-.26003 

(.01729) 


-.15911 

(.05919) 


281 


10.0 


.08765 

(.02337) 


.28800 

(.04112) 


-.30374 

(.02242) 


-.47180 
(.1 1283) 


200 


12.5 


.14951 

(.03059) 


1.96444 

(.09357) 


-.32021 

(.02708) 


-.87080 

(.09137) 


185 


15.0 


.22930 

(.03909) 


5.69514 

(.19744) 


-.32195 

(.03245) 


-.87581 

(.12419) 


185 



A A 

In order to investigate the elFect of censoring on the values of P f (t) and P a (t) for 
large t, a simulation study is done in which the exponential censoring times has a mean 
of l/c= 1000. The other parameters are Pj=l, p,= l, and 0 = 0.5 as before. Once 
again the number of individuals are N= 10, 50. The results are presented in Tables VI 
and VII. The results in Table VI suggest that the effect of the small sample size 
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resulting from N=10 dominates the performance of all the estimates except for the 
Kaplan-Meier estimate for large t. The results of Table VII suggest that the method of 

A A A 

computing P f (t) is affecting its performance for large t since P f (t)/P a (t) ~ 1 as t-* c o. 
For N= 50, limited censoring has improved the average relative bias of all the estimates 
for large t. Somewhat surprisingly, with limited censoring the Kaplan-Meier estimate 
has almost the best average relative bias. However, the standard error of its estimate is 
larger than that of the other estimates'. Thus the Kaplan-Meier estimate tends to be 
more variable than the other estimates. 



TABLE VI 

AVERAGE RELATIVE BIAS 

Exponential Model N= 10 (Limited Censoring, c = 0.001) (500 Reps) 



Time 


a. 


P r (t) 


.5 


-.01895 

(.00360) 


-.01133 

(.00441) 


1.0 


-.03008 

(.00629) 


-.04266 

(.00644) 


2.0 


-.04178 

(.01058) 


-.08352 

(.00955) 


5.0 


-.01755 

(.02166) 


-.02716 

(.02091) 


7.0 


.04003 

(.02968) 


.05500 

(.03044) 


10.0 


.18364 

(.04466) 


.25792 

(.04946) 


12.5 


.36140 

(.06146) 


.53159 

(.07442) 


15.0 


.60294 

(.08414) 


.94267 

(.11360) 



P a «) 


p k «) 


#KM 


-.03S53 

(.00476) 


-.00768 

(.00718) 


500 


-.02281 

(.00614) 


-.00492 

(.01025) 


500 


-.02283 

(.01049) 


-.01816 

(.01461) 


500 


-.01934 

(.02207) 


.04571 

(.02963) 


500 


.02483 

(.03009) 


.09415 

(.03971) 


500 


.15076 

(.04481) 


.05896 

(.06421) 


500 


.31332 

(.06127) 


.03902 

(.09355) 


500 


.53732 

(.08351) 


-.00407 

(.13365) 


500 
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TABLE VII 

AVERAGE RELATIVE BIAS 



Exponential Model N=50 (Limited Censoring, c = 0.001) (500 Reps) 



Time 




P r (t) 


p a (o 


p k w 


#KM 


.5 


-.00470 

(.00144) 


-.04348 

(.00193) 


-.07253 

(.00267) 


.00380 

(.00313) 


500 


1.0 


-.00760 

(.00259) 


-.15247 

(.00285) 


-.03784 

(.00318) 


.00104 

(.00425) 


500 


2.0 


-.OHIO 

(.00452) 


-.40891 

(.00355) 


-.01343 

(.00464) 


.00562 

(.00645) 


500 


5.0 


-.01035 

(.00982) 


-.79859 

(.00278) 


-.00883 

(.00987) 


.00497 

(.01279) 


500 


7.0 


-.00123 

(.01340) 


-.79946 

(.00241) 


-.00283 

(.01350) 


-.00350 

(.01820) 


500 


10.0 


.02459 

(.01903) 


-.59136 

(.00335) 


.01905 

(.01922) 


-.01116 

(.02926) 


500 


12.5 


.05757 

(.02413) 


-.13724 

(.00679) 


.04945 

(.02440) 


-.08114 

(.04066) 


500 


15.0 


.10115 

(.02977) 


.84405 

(.01457) 


.09097 

(.03015) 


-.06882 

(.05631) 


500 



Below are reported simulation results experimenting with different parameter 
values of and c. For these studies, the number of individuals is set at N = 50 to 
reduce the effects of undefined Kaplan-Meier estimates of Fj. Four different cases are 
simulated. The sojourn time in state 1 is changed to reflect a higher and lower mean 
sojourn time and the censoring mean time is changed to reflect more or less censoring. 

The first cases that are simulated are the changes in the mean sojourn time in 
state 1. The other parameters are c = 0.5, p 2 = 1, and 0 = 0.5 as before. The mean 
sojourn time of state 1 is increased from 1 to 2 (Pj = 0.5) and decreased from 1 to 0.5 
(Pj = 2). With the increase in the mean sojourn time of state 1, the probability of a 
death being censored increases. For a decrease in the mean sojourn time, the opposite 
is true. There are quicker jumps out of state 1, resulting in more uncensored deaths. 
Tables VIII and IX show the computed average relative bias using equation 3.1 along 
with the associated standard error. 
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TABLE VIII 

AVERAGE RELATIVE BIAS 
Exponential Model N=50 (pj = 0.5) (500 Reps) 



Time 


Pp(0 


P r (t) 




p k (o 


#KM 


.5 


.00045 


-.00505 


-.14754 


.00062 


500 


(.00136) 


(.00188) 


(.00921) 


(.00234) 




1.0 


.00175 


-.02090 


-.18477 


.00707 


500 




(.00261) 


(.00326) 


(.01020) 


(.00366) 




2.0 


.00647 


-.06266 


-.25529 


.00877 


500 


(.00496) 


(.00597) 


(.01228) 


(.00616) 




5.0 


.03886 


.06045 


-.40236 


.07279 


415 


(.01198) 


(.01328) 


(.01702) 


(.02202) 




7.0 


.07539 


.25430 


-.45897 


-.04033 


255 




(.01717) 


(.02093) 


(.01959) 


(.05350) 




10.0 


.15287 


.76927 


-.50727 


-.65507 


152 




(.02643) 


(.04165) 


(.02366) 


(.07690) 




12.5 


.24079 


1.53886 


-.52490 


-.88639 


137 




(.03615) 


(.07217) 


(.02772) 


(.06656) 




15.0 


.35370 


2.79204 


-.52738 


-.86033 


137 




(.04857) 


(.12186) 


(.03276) 


(.09855) 




In Table VIII where 


the mean sojourn time in state 1 increases, the average 


relative bias 


of the parametric estimate 


looks about the 


same as in 


Table IV. The 



average relative bias of the renewal estimate is slightly better than in Table IV. The 
average relative bias of the asymptotic estimate looks like it increased, but is within 
three standard errors of Table IV. The average relative bias of the Kaplan-Meicr 
estimate of the first passage time to state 0 looks the same as in Table IV. The number 
of defined Kaplan-Mcier estimates has decreased due to the increase in the probability 
of a censored death as mentioned earlier. There are two survivors at t— 15. 

In Table IX where the mean sojourn time in state 1 decreases, the parametric, 
asymptotic, and Kaplan-Meier estimates have the same average relative bias as in 
Table IV. The number of defined Kaplan-Meier estimates has increased due to the 
decrease in the probability of a censored death. The renewal estimate has increased as 
compared to Table IV. 
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TABLE IX 

AVERAGE RELATIVE BIAS 



Exponential Model N = 50 (Pj = 2) (500 Reps) 



Time 


PpW 






P k (t) 


#KM 


.5 


-.00691 

(.00323) 


-.07878 

(.00369) 


-.11848 

(.00504) 


.00049 

(.00434) 


500 


1.0 


-.00907 

(.00537) 


-.22875 

(.00512) 


-.10384 

(.00753) 


-.00438 

(.006S5) 


500 


2.0 


-.00826 

(.00893) 


-.49325 

(.00721) 


-.16288 

(.01187) 


-.00674 

(.01211) 


500 


5.0 


.04037 

(.02060) 


-.16543 

(.02167) 


-.30815 

(.02132) 


-.08353 

(.05516) 


308 


7.0 


.11149 

(.03055) 


.95936 

(.04897) 


-.34515 

(.02811) 


-.61487 

(.08054) 


221 


10.0 


.28446 

(.05190) 


5.05635 

(.15416) 


-.34050 

(.04336) 


-.78435 

(.11361) 


207 


12.5 


.50537 

(.07987) 


12.93165 

(.40802) 


-.28668 

(.06490) 


-1.00000 

(.00000) 


207 


15.0 


.82543 

(.12395) 


31.25238 

(1.11064) 


-.17898 

(.10098) 


-1.00000 

(.00000) 


207 



The next cases that are simulated are the changes in the censoring distribution; 
the other parameters are p ( = 1, p,= 1, and 0 = 0.5 as before. The exponential mean 
time to censor is increased from 2 to 4 (c = 0.25) and decreased from 2 to 1 (c= 1). 
With an increase in the mean censoring time, the probability of a censored death 
decreases. With a decrease in the mean censoring time, the opposite is true. Tables X 
and XI show the average relative bias for each simulation along with the standard 
error. 

Table X where the mean censoring time increases (l/c = 4), falls between Table 
IV and Table VII. The average relative bias of the parametric estimate is worse than in 
the limited censoring case of Table VII but slightly better than in the case c = 0.5 of 
Table IV. The average relative bias of the renewal estimate is much worse than it is 
with the limited censoring of Table VI 1 but about the same to slightly better in the tail 
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than in the case c = 0.5 of Table IV. The average relative bias of the asymptotic 
estimate is about the same as the limited censoring case but much better than in the 
case c = 0.5. The average relative bias of the Kaplan-Meier estimate of the first passage 
time to state 0 is about the same for small to moderate times and worse for large times 
than in the limited censoring case of Table VII and better for large times than in the 
case c = 0.5 of Table IV. The number of defined Kaplan-Meier estimates is between the 
two tables, due to the increase in the mean censoring times. There are five survivors 
past t= 15. 



TABLE X 

AVERAGE RELATIVE BIAS 



Exponential Model N=50 (c=0.25) (500 Reps) 



Time 


VO 


P r (t) 


p a (0 


P k (t) 


#KM 


.5 


-.00099 

(.00179) 


-.03349 

(.00230) 


-.06677 

(.00331) 


.00170 

(.00325) 


500 


1.0 


-.00071 

(.00326) 


-.13166 

(.00348) 


-.04538 

(.00434) 


.00874 

(.00466) 


500 


2.0 


.00178 

(.00581) 


-.36077 

(.00525) 


-.04544 

(.00694) 


.00837 

(.00720) 


500 


5.0 


.02794 

(.01330) 


-.57486 

(.00892) 


-.09177 

(.01460) 


.01752 

(.02043) 


486 


7.0 


.06206 

(.01876) 


-.27543 

(.01723) 


-.10565 

(.01949) 


.06881 

(.04235) 


411 


10.0 


.13879 

(.02822) 


.85619 

(.03879) 


-.09864 

(.02727) 


-.10377 

(.09322) 


307 


12.5 


.22881 

(.03788) 


• 2.71999 
(.07635) 


-.06942 

(.03479) 


-.59309 

(.11491) 


276 


15.0 


.34587 

(.04987) 


5.92070 

(.15972) 


-.01964 

(.04386) 


-.69629 

(.16349) 


273 
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In Table XI where the mean censoring time decreases, the average relative bias of 
the parametric estimate is about the same for small to moderate times and then is 
worse for large times than in Table IV. The average relative bias of the renewal and 
asymptotic estimates are both worse for t> 2 due possibly to an increase in the number 
of dishonest Kaplan- Meier estimates of Fj. The average relative bias of the the 
Kaplan-Meier estimate of the first passage time to state 0 is worse for t> 5. The 
number of defined Kaplan-Meier estimates has decreased reflecting the decrease in the 
mean time to censoring. 



TABLE XI 

AVERAGE RELATIVE BIAS 



Exponential Model N = 50 (c=l) (500 Reps) 



Time 


P p (0 


■*-* 

u, 


p a (0 


P k (t) 


#KM 


.5 


-.00268 

(.00270) 


-.01350 

(.00317) 


-.27293 

(.00971) 


-.00012 

(.00369) 


500 


1.0 


-.00193 

(.00491) 


-.05220 

(.0056S) 


-.35347 

(.01171) 


.00391 

(.00610) 


500 


2.0 


.00809 

(.00878) 


-.07041 

(.01133) 


-.47848 

(.01452) 


-.00203 

(.01352) 


488 


5.0 


.10233 

(.02127) 


.54674 

(.03027) 


-.66059 

(.01795) 


-.63062 

(.07727) 


128 


7.0 


.21600 

(.03220) 


1.46539 

(.06044) 


-.70866 

(.02021) 


-.93652 

(.05174) 


113 


10.0 


.47564 

(.05588) 


4.48047 

(.15766) 


-.73703 

(.02540) 


- 1.00000 

(.00000) 


111 


12.5 


.80113 

(.08693) 


10.05152 

(.34001) 


-.73711 

(.03215) 


- 1.00000 

(.00000) 


111 


15.0 


1.26789 

(.13558) 


21.62486 

(.72038) 


-.72108 

(.04211) 


- 1.00000 

(.00000) 


111 
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C. ROBUSTNESS 



In the above simulations the maximum likelihood estimate used the known 
correct model. Often, a model needs to be chosen to describe a data set. Attempts are 
made to analyze the data to determine a good model. However, when sample sizes are 
small, the difficulty of finding a good model increases. Hence due to small sample sizes 
or ease of computation, an incorrect model may be chosen to describe a data set. In 
this section, the robustness of the estimates proposed in Chapter II is studied with 
respect to an incorrect model assumption concerning the sojourn time in state 1. 

The data for the simulation experiment in this section are generated from the 
following three state semi-Markov process: Individuals start in state 1 at t=0. The 
probability of a jump to state 0 is 0; to state 2 is 1-0. From state 2 the probability of a 
jump to state 1 is 1. State 0 is an absorbing state. The sojourn time in state 2 is 
exponential with mean l/p 2< The sojourn time in state 1 is the sum of two independent 
exponentials with means 1/Pj and l/p 2 ; that' is, the sojourn time in state 1 has a 
hypoexponential distribution. Censoring is independent and exponentially distributed 
with mean 1/c. The same basic Fortran program is employed, modified for the above 
change. The data generated are analyzed by the same Fortran subroutines for each 
estimate as in the first section. In particular, the (incorrect) maximum likelihood 
estimate of equation 2.9 is used. This maximum likelihood estimate assumes the 
sojourn time in state 1 has an exponential distribution rather than the true 
hypoexponential distribution. 

For the first simulation results reported, parameter values of Pj = 1, p 2 = 1, p 3 = 1, 
0 = 0.5, and c = 0.5 are used. Again, two dilFcrent numbers of observed individuals are 
used, 10 and 50. The simulation is replicated 500 times and the average relative bias is 
computed utilizing equation 3.1. For the Kaplan-Meier estimate, M is taken as the 
number of defined Kaplan-Meier estimates of the first passage time to state 0 by time 
t. For the others, M is the number of replications. The actual value of the survivor 
function is computed by inverting the Laplace transform of the passage time to state 0 
for the semi-Markov process. 
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Tables XII and XIII show the average relative bias of the hypoexponential 
model at selected values of t along with its associated standard error for N= 10 and 50. 
Again the right most column is the number of defined Kaplan-Meier estimates of the 
first passage time to state 0 out of 500 replications. 



TABLE XII 

AVERAGE RELATIVE BIAS 



Hypoexponential Model N=10 (500 Reps) 



Time 


PpW 


P r (t) 


p a «) 


p k « 


#KM 


.5 


-.04965 

(.00215) 


.00603 

(.00225) 


-.18749 

(.01511) 


.00434 

(.00332) 


500 


1.0 


-.04224 

(.00415) 


.00518 

(.00462) 


-.27115 

(.01478) 


.00119 

(.00622) 


499 


2.0 


.02344 

(.00829) 


.01920 

(.00994) 


-.35S21 

(.01670) 


.03759 

(.01170) 


4S8 


5.0 


.27429 

(.02259) 


.23906 

(.02654) 


-.45864 

(.02448) 


-.03008 

(.05463) 


253 


7.0 


.49966 

(.03501) 


.58490 

(.04261) 


-.46202 

(.03062) 


-.51308 

(.08072) 


164 


10.0 


.98106 

(.06148) 


1.53795 

(.08272) 


-.40812 

(.04310) 


-.90325 

(.05049) 


142 


12.5 


1.56115 

(.09466) 


2.93742 

(.14238) 


-.31726 

(.05S00) 


-.97945 

(.02055) 


140 


15.0 


2.37288 

(.14350) 


5.27240 

(.24293) 


-.18077 

(.07897) 


-1.00000 

(.00000) 


139 



In Table XII, for N= 10, the parametric estimate based on the incorrect model 
shows more relative bias for small t than the results in Table III using the correct 
maximum likelihood model. However, for moderate times l<t<7 the effect of the 
small number of individuals has overwhelmed the effect of the incorrect model and the 
relative bias is approximately the same as for the correct model given in Table III. 
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The average relative bias of the nonparametric estimates appear to do well. The 
renewal estimate and the Kaplan-Meier estimate of the first passage time to state 0 
seem to do very well for small times and about the same for moderate to large times, 
with the Kaplan-Meier decreasing to -1.0 at t= 15. The asymptotic estimate seems to 
do about the same as in the situation of Table III; it is still negatively biased and has 
the smallest average relative bias for large times. 

TABLE XIII 

AVERAGE RELATIVE BIAS 



Hypoexponential Model N = 50 (500 Reps) 



Time 


Pp(0 


P r (t) 




V) 


#KM 


.5 


-.04769 

(.00104) 


.00196 

(.00104) 


-.10608 

(.00701) 


-.00089 

(.00154) 


500 


1.0 


-.04010 

(.00203) 


-.00130 

(.00229) 


-.14888 

(.00745) 


.00136 

(.00297) 


500 


2.0 


.02140 

(.00415) 


-.02910 

(.00520) 


-.19231 

(.00965) 


.00557 

(.00563) 


500 


5.0 


.21701 

(.01169) 


.05244 

(.01291) 


-.29425 

(.01620) 


.03315 

(.02328) 


421 


7.0 


.36307 

(.01803) 


.20321 

(.01998) 


-.33710 

(.01987) 


-.09714 

(.05655) 


246 


10.0 


.63698 

(.03053) 


.64409 

(.03839) 


-.36628 

(.02550) 


-.53280 

(.09457) 


162 


12.5 


.92760 

(.04476) 


1.32451 

(.06729) 


-.36722 

(.03096) 


-.93675 

(.05242) 


150 


15.0 


1.29131 

(.06391) 


2.47259 

(.11708) 


-.35081 

(.03768) 


-.97947 

(.02053) 


149 



In Table XIII, the case of the larger number of individuals N = 50, the effect of 
the incorrect model of the maximum likelihood estimate has a more noticable effect on 
the average relative bias; the average relative bias for the parametric estimate is 
significantly higher than for the nonparametric renewal and Kaplan-Meier estimates for 
t^2. The nonparametric renewal and Kaplan-Meier estimates have about the same 
average relative bias for t<5. The average relative bias of the asymptotic estimate 
does the same as in Table IV, and still consistently negatively biased. The 
Kaplan-Meier estimate of the first passage time to state 0 does the same as in Table 
IV. There is still one survivor at t= 15. 
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A simulation experiment is done for the case in which there is a relatively high 
number of subjects N = 100; the other parameters are as before. Table XIV show the 
effects of the increase in observed individuals. The average relative bias of the 
parametric estimate shows less relative bias than in Table XIII but still significantly 
higher than Table V using the correct model with comparable number of subjects. The 
average relative bias of the nonparametric estimates has lower relative bias than Table 
XIII and about the same relative bias as Table V. Again it appears that the average 
relative bias for all the estimates decrease as the number of individuals increase. 



TABLE XIV 

AVERAGE RELATIVE BIAS 



Hypoexponential Model N = 100 (500 Reps) 



Time 


P p W 


P r (t) 


P.W 




#KM 


.5 


.00010 

(.02439) 


.04879 

(.02528) 


-.04324 

(.01982) 


.05034 

(.02493) 


500 


1.0 


.03696 

(.03963) 


.05200 

(.03902) 


-.05118 

(.03019) 


.07747 

(.03903) 


500 


2.0 


.1325S 

(.05843) 


-.05690 

(.04927) 


-.05490 .11607 

(.03941) (.05859) 


500 


5.0 


.18619 

(.00875) 


-.44921 

(.01064) 


-.21495 

(.01202) 


-.02024 

(.01586) 


491 


7.0 


.30873 

(.01312) 


-.18768 

(.01709) 


-.26080 

(.01481) 


-.04054 

(.03720) 


335 


10.0 


.52842 

(.02135) 


.50623 

(.02773) 


-.30765 

(.01864) 


-.47076 

(.08035) 


184 


12.5 


.75156 

(.03012) 


1.15119 

(.04520) 


-.33121 

(.02174) 


-.85091 

(.06960) 


160 


15.0 


1.01691 

(.04147) 


2.06055 

(.07748) 


-.34404 

(.02499) 


-.92400 

(.05621) 


158 



To study the effects of censoring for this model, a simulation experiment is done 
in which the exponential censoring times has a mean of l/c= 1000. The other 
parameters arc Pj = 1, p, = 1, p 3 = 1, and 0 = 0.5 as before. The number of individuals is 
N = 50. The results are shown in Table XV. The average relative bias of the parametric 
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estimate is higher for all times than Table VII using the correct model. The average 
relative bias of the nonparametric estimates are about the same as Table VII. Again, 
even with limited censoring, the average relative bias of the renewal estimate has 
computational problems for large t. 



TABLE XV 

AVERAGE RELATIVE BIAS 



Hypoexponential Model N = 50 (Limited Censoring, c = 0.001) (500 Reps) 



Time 


' P p (0 


p r » 


P.W 




#KM 


.5 


-.07105 

(.00060) 


-.00384 

(.00081) 


-.03344 

(.00157) 


.00197 

(.00136) 


500 


1.0 


-.08365 

(.00114) 


-.03442 

(.00156) 


-.04223 

(.00170) 


.00167 

(.00226) 


500 


2.0 


-.06224 

(.00224) 


-.19090 

(.00279) 


-.02480 

(.00254) 


.00162 

(.00391) 


500 


5.0 


-.00469 

(.00561) 


-.70151 

(.00294) 


-.00948 

(.00610) 


.00063 

(.00789) 


500 


7.0 


.02631 

(.00798) 


-.80405 

(.00221) 


-.01056 

(.00856) 


-.00350 

(.01105) 


• 500 


10.0 


.07973 

(.01182) 


-.75567 

(.00230) 


-.00598 

(.01231) 


.00391 

(.01615) 


500 


12.5 


.13071 

(.01535) 


-.59906 

(.00345) 


.00328 

(.01553) 


-.01794 

(.02164) 


500 


15.0 


.18831 

(.01925) 


-.31213 

(.00598) 


.01761 

(.01889) 


-.02465 

(.02797) 


500 



Two additional simulations are done using different hypoexponential distributions 
for the sojourn time in state 1. For these simulations the number of individuals is set at 
N = 50 for comparative purposes. The first simulation uses a hypoexponential 
distribution of = 1, p 2 = 1, p 3 = 0.1, 0 = 0.5, and c = 0.5. Table XVI shows the 
average relative bias and standard error for this model. 
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TABLE XVI 

AVERAGE RELATIVE BIAS 



Hypoexponential Model N = 50 (p 3 = 0.1) (500 Reps) 



Time 


Pp(0 


P r (t) 


P a (0 


P k (t) 


#KM 


.5 


-.01070 

(.00036) 


.00048 

(.00033) 


-.41461 

(.01923) 


-.00004 

(.00049) 


500 


1.0 


-.01378 

(.00071) 


-.00012 

(.00079) 


-.49394 

(.01835) 


-.00087 

(.00108) 


500 


2.0 


-.00851 

(.00141) 


-.00275 

(.00172) 


-.59801 

(.01730) 


-.00309 

(.00208) 


500 


5.0 


.03290 

(.00359) 


-.00058 

(.00541) 


-.74519 

(.01537) 


-.00397 

(.00716) 


492 


7.0 


.06534 

(.00512) 


.01743 

(.00817) 


-.79285 

(.01446) 


.01353 

(.01382) 


372 


10.0 


.11795 

(.00755) 


.08352 

(.01168) 


-.83604 

(.01349) 


-.08234 

(.04149) 


165 


12.5 


.16554 

(.00971) 


.17542 

(.01464) 


-.85869 

(.01295) 


-.39425 

(.07374) 


85 


15.0 


.21680 

(.01203) 


.28205 

(.01784) 


-.87462 

(.01258) 


-.69274 

(.08075) 


60 



Surprisingly, in Table XVI, the average relative bias of the. parametric estimate is 
significantly better than Table XIII. The survivor function of the first passage time to 
state 0 for the semi-Markov model having the sum of two exponentials with mean 1 
and 10 for the sojourn time in state 1 was computed. It was compared to the 
corresponding survivor function of the Markov model of Chapter II having 
exponential sojourn time in state 1 with mean 11. The parameters p 2 =l, and 0 = 0.5 
are as before for both models. For large t, the two survivor functions are 
approximately P(D> t} = exp[-0.045t] for the semi-Markov model and 
P(D > t) = exp[-0.043t] for the Markov model. Thus it appears that the small average 
relative bias for the parametric estimate in Table XVI is due to the closeness of the 
survivor functions for the two models. The average relative bias of the renewal 
estimate is significantly better than Tables XIII and IV. The average relative bias of 
the asymptotic estimate is significantly worse than Tables XIII and IV. A possible 
explanation for this is that with a mean sojourn time in state 1 of approximately 11 
and a mean censoring time of 2, the process either jumps to state 0 at first transition or 
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becomes censored due to the expected long sojourn time in state 1. Therefore, the 
Kaplan-Meier estimates for F ; will probably contain most of the probability mass at 
small times and relatively little mass for large times due to censoring, causing the 
Kaplan-Meier estimates of F ; to be unreliable for large times. The renewal and 
Kaplan-Meier estimates have approximately the same relative bias for t^7. The 
average relative bias of the Kaplan-Meier estimate of the first passage time to state 0 is 
about the same as Tables XIII and IV. For t^.12.5, the Kaplan-Meier estimate is 
significantly better than Tables XIII and IV, however, of the 500 replications only 85 
are still defined by t= 12.5 in Table XVI. 

The next simulation experiment uses a hypoexponential distribution for the 
sojourn time in state 1 that very closely resembles the exponential distribution used in 
Table IV. For this simulation, the parameter are p } = 1, p 2 = 1, p 3 = 100, 0 = 0.5, and 
c = 0.5. Again N=50 for comparison purposes. Table XVII show the average relative 
bias and standard error of the estimates at selected times. The average relative bias of 
all of the estimates are about the same as in Table IV as expected. 



TABLE XVII 

AVERAGE RELATIVE BIAS 
Hypoexponential Model N = 50 (p 3 = 100) (500 Reps) 



Time 


Pp(0 


P r (t) 


P a (0 


P k (t) 


#KM 


.5 


-.00825 

(.00213) 


-.02784 

(.00266) 


-.12817 

(.00574) 


-.00435 

(.00334) 


500 


1.0 


-.01009 

(.00387) 


-.11168 

(.00420) 


-.13732 

(.00743) 


-.00655 

(.00473) 


500 


2.0 


-.01009 

(.00684) 


-.30050 

(.00717) 


-.18659 

(.01074) 


-.00933 

(.00880) 


500 


5.0 


.02098 

(.01543) 


-.05434 

(.01872) 


-.31762 

(.01803) 


.02698 

(.03958) 


333 


7.0 


.06513 

(.02182) 


.52736 

(.03064) 


-.36331 

(.02223) 


-.18712 

(.08288) 


213 


10.0 


.16666 

(.03337) 


1.90635 

(.07764) 


-.38933 

(.02940) 


-.61354 

(.11659) 


166 


12.5 


.28695 

(.04577) 


4.30938 

(.16887) 


-.38193 

(.03724) 


-.79971 

(.14178) 


161 


15.0 


.44578 

(.06199) 


8.99799 

(.36461) 


-.35141 

(.04795) 


- 1.00000 

(.00000) 


160 
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IV. CONCLUSIONS 



From the results of Chapter III, it can be concluded: 

1) The maximum likelihood estimate uses the most assumptions about the model. 
It understandably does well when the model used is correct. It is the most 
sensitive to incorrect model assumptions. 

2) The renewal estimate and asymptotic estimate are biased by censoring of the 
last sojourn time in a state which makes the Kaplan-Meier estimate undefined. 
Further analysis could be done to investigate reasonable methods to make the 
Kaplan-Meier estimate honest. 

3) The asymptotic estimate has the smallest average relative bias for large times, 
r= 15. Flowever, the bias is always negative. Further analysis could be done to 
find a bias correction for it. 

4) The Kaplan-Meier estimate of the first passage time to state 0 uses the least 
knowledge. It does well for small times and moderate to large numbers of 
individuals. The Kaplan-Meier estimate and the renewal estimate appear to do 
about as well for small t. 

5) The larger the number of individuals the smaller the average relative bias is for 
all the estimates. 

6) The renewal estimate requires a great deal of computation. In view of the 
simulation results, one recommendation is to use the Kaplan-Meier estimate as 
long as not too many observations are censored, and then use the asymptotic 
estimate for larger times. The asymptotic estimate needs to be used with 
caution if the last sojourn times in state 1 or 2 are censored. 
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